COMPACT VISUAL SUMMARIES USING
SUPERHISTOGRAMS AND FRAME SIGNATURES

RELATED APPLICATION 

This patent application is related to co-pending United States 
Patent Application No. 09/116,769 filed July 16, 1998 by Martino et 
al. entitled "A Histogram Method for Characterizing Video Content."
The disclosure in United States Patent Application No. 09/116,769 
is hereby incorporated by reference in the present patent 
application as if fully set forth herein. 

TECHNICAL FIELD OF THE INVENTION 

The present invention is directed, in general, to the creation
of visual summaries of video material and, more specifically, to a
system and method that creates compact visual summaries using
superhistograms and frame signatures.

BACKGROUND OF THE INVENTION

A wide variety of video recorders are available in the
marketplace. Most people own, or are familiar with, a video
cassette recorder (VCR). A video cassette recorder records video
programs on magnetic cassette tapes. More recently, video
recorders have appeared in the market that use computer magnetic
hard disks rather than magnetic cassette tapes to store video
programs. For example, the ReplayTV™ recorder and the TiVO™
recorder digitally record television programs on hard disk drives
using, for example, an MPEG video compression standard.
Additionally, some video recorders may record on a
readable/writable digital versatile disk (DVD).

The widespread use of video recorders has generated and
continues to generate large volumes of video materials.
The existence of large volumes of video materials has created a 
demand for systems that are capable of creating summaries of video 
materials. Summaries of video materials can be visual summaries, 

audio summaries, or textual summaries, or combinations of visual,
audio and textual summaries. Presently existing methods for 
creating visual summaries generally involve extracting keyframes 
from the video material. An improved method for creating visual 
summaries involves extracting frame signatures from the keyframes 
and then using the frame signatures to filter the keyframes.
However, these methods still leave a large number of keyframes 
remaining after the filtering process has been completed. 

Many presently existing devices have limited storage capacity.
For example, personal digital assistants (PDAs) and other similar
types of devices are not able to store large amounts of data. Such
devices cannot effectively use visual summaries that contain a
large number of keyframes.

There is therefore a need for an improved system and method
that is capable of creating a compact visual summary. There is a
need for an improved system and method that is capable of
selectively creating a compact visual summary that contains fewer
keyframes than prior art visual summaries contain.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide an 
improved system and method for creating compact visual summaries. 

It is also an object of the present invention to provide an 
improved system and method for creating compact visual summaries 
using superhistograms and frame signatures. 

In one advantageous embodiment, the apparatus of the present 
invention comprises a visual summary controller that is capable of 
(1) receiving keyframes of video material, and (2) extracting frame 
signatures from the keyframes, and (3) using the frame signatures 
to create superhistograms from the keyframes, and (4) using the 
frame signatures and the superhistograms to create a compact visual 
summary of the video material. The visual summary controller uses 
the superhistograms to filter and cluster the keyframes, and adds 
representative frames from the clustered keyframes to the compact 
visual summary. 

The visual summary controller also comprises a visual summary 
retrieval module that retrieves a visual summary from storage and 
displays the visual summary in response to a user request. 

The foregoing has outlined rather broadly the features and 
technical advantages of the present invention so that those skilled 
in the art may better understand the detailed description of the 
invention that follows. Additional features and advantages of the 
invention will be described hereinafter that form the subject of
the claims of the invention. Those skilled in the art should
appreciate that they may readily use the conception and the
specific embodiment disclosed as a basis for modifying or designing
other structures for carrying out the same purposes of the present
invention. Those skilled in the art should also realize that such 
equivalent constructions do not depart from the spirit and scope of 
the invention in its broadest form. 

Before undertaking the Detailed Description of the Invention,
it may be advantageous to set forth definitions of certain words
and phrases used throughout this patent document: the terms
"include" and "comprise," as well as derivatives thereof, mean
inclusion without limitation; the term "or" is inclusive, meaning
and/or; the phrases "associated with" and "associated therewith,"
as well as derivatives thereof, may mean to include, be included
within, interconnect with, contain, be contained within, connect to
or with, couple to or with, be communicable with, cooperate with,
interleave, juxtapose, be proximate to, be bound to or with, have,
have a property of, or the like; and the terms "controller,"
"processor," and "apparatus" mean any device, system or part
thereof that controls at least one operation. Such a device may be
implemented in hardware, firmware or software, or some combination
of at least two of the same. It should be noted that the
functionality associated with any particular controller may be
centralized or distributed, whether locally or remotely. In
particular, a controller may comprise one or more data processors,
and associated input/output devices and memory, that execute one or
more application programs and/or an operating system program.
Definitions for certain words and phrases are provided throughout
this patent document. Those of ordinary skill in the art should
understand that in many, if not most, instances, such definitions
apply to prior, as well as future, uses of such defined words and
phrases.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention,
and the advantages thereof, reference is now made to the following
descriptions taken in conjunction with the accompanying drawings,
wherein like numbers designate like objects, and in which:

FIGURE 1 illustrates a block diagram of an exemplary system
for creating visual summaries comprising an advantageous embodiment
of the present invention;

FIGURE 2 illustrates computer software that may be used with
an advantageous embodiment of the present invention;

FIGURE 3 illustrates an exemplary superhistogram comprising
three family histograms; and

FIGURE 4 illustrates a flow diagram showing an advantageous
embodiment of a method of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIGURES 1 through 4, discussed below, and the various
embodiments used to describe the principles of the present
invention in this patent document are by way of illustration only
and should not be construed in any way to limit the scope of the
invention. In the description of the exemplary embodiment that
follows, the present invention is integrated into, or is used in
connection with, one particular type of system for creating visual
summaries. Those skilled in the art will recognize that the
exemplary embodiment of the present invention may easily be
modified for use in other types of systems for creating visual
summaries.

FIGURE 1 illustrates a block diagram of an exemplary
system 100 for creating visual summaries. System 100 comprises
video processor 110. Video processor 110 receives video signals,
formats the video signals into frames, and identifies keyframes.
One example of this type of video processor is described in United
States Patent No. 6,137,544 by Dimitrova et al., issued on
October 24, 2000, entitled "Significant Scene Detection and Frame
Filtering for a Visual Indexing System." United States Patent No.
6,137,544 and the disclosures therein are hereby incorporated by
reference in the present patent application as if fully set forth
herein.

Video processor 110 stores the keyframes in memory unit 120.
Memory unit 120 may comprise random access memory (RAM). Memory
unit 120 may comprise a non-volatile random access memory (RAM),
such as flash memory. Memory unit 120 may comprise a mass storage
data device, such as a hard disk drive (not shown). Memory unit
120 may also comprise an attached peripheral drive or removable
disk drive (whether embedded or attached) that reads read/write
DVDs or rewritable CD-ROMs. As illustrated in FIGURE 1, removable
disk drives of this type are capable of receiving and reading
rewritable CD-ROM disk 125.

Video processor 110 provides the keyframes to controller 130
of the present invention. Controller 130 is capable of receiving
control signals from video processor 110 and sending control
signals to video processor 110. Controller 130 is also coupled to
video processor 110 through memory unit 120. As will be more fully
described, controller 130 is capable of creating a compact visual
summary from the keyframes received from video processor 110.
Controller 130 creates compact visual summaries that contain fewer
keyframes than the visual summaries created by prior art visual
summary systems. Controller 130 stores each compact visual summary
in memory unit 120. Video processor 110, in response to a user
request, accesses the compact visual summary stored in memory
unit 120 and outputs the compact visual summary to a display (not
shown) that is viewed by the user.

As shown in FIGURE 1, controller 130 comprises keyframe filter
module 140, color information module 150, histogram and keyframe
selection module 160, visual summary module 170, and visual summary
retrieval module 180. As will be more fully described, keyframe
filter module 140 extracts frame signatures from the keyframes that
controller 130 receives from video processor 110, and then uses the
frame signatures to filter those keyframes. Color information
module 150 generates color information from the filtered keyframes.
Histogram and keyframe selection module 160 derives superhistograms
from the color information and selects representative keyframes
from the superhistograms. Visual summary module 170 then creates a
compact visual summary using the selected keyframe images and
stores the compact visual summary in memory unit 120.

Visual summary retrieval module 180, in response to a user
request received through video processor 110, accesses those visual
summaries that match the user request. When a match is found,
visual summary retrieval module 180 identifies the appropriate
visual summary to video processor 110. Video processor 110 then
outputs the visual summary to a display (not shown) for the user.

Controller 130 must identify the appropriate keyframes to
be used to create a compact visual summary. An advantageous
embodiment of the present invention comprises computer software 200
capable of identifying the appropriate keyframes to be used to
create a compact visual summary for the video material. FIGURE 2
illustrates a selected portion of memory unit 120 that contains
computer software 200 of the present invention. Memory unit 120
contains operating system interface program 210, keyframe filter
application 220, color information application 230, superhistogram
application 240, keyframe selection application 250, visual summary
application 260, and visual summary storage locations 270.

Controller 130 and computer software 200 together comprise a 
visual summary controller that is capable of carrying out the 
present invention. Under the direction of instructions in computer 
software 200 stored within memory unit 120, controller 130 creates 
a compact visual summary for the video material, stores the compact 
visual summary in visual summary storage locations 270, and replays 
the stored visual summary at the request of the user. Operating 
system interface program 210 coordinates the operation of computer 
software 200 with the operating system of controller 130. 

To create a compact visual summary, the visual summary 
controller of the present invention (comprising controller 130 and 
software 200) first executes keyframe filter application 220 to 
extract frame signatures from the keyframes that controller 130 has 
received from video processor 110. Keyframe filter application 220
then uses the frame signatures to filter the keyframes. The 
filtering process reduces the number of keyframes. 
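The document does not define how a frame signature is computed or how the signatures drive the filtering. The following is a minimal sketch under stated assumptions only: a keyframe is a grayscale image given as rows of pixel intensities, its signature is the mean intensity of each cell in a hypothetical 4 x 4 grid, and a keyframe is dropped when its signature lies within an assumed threshold of the last keyframe that was kept. The names `frame_signature`, `signature_distance`, and `filter_keyframes` are illustrative, not part of keyframe filter application 220.

```python
from typing import List, Optional, Sequence

# Assumption: a keyframe is a grayscale image stored as rows of intensities.
Frame = List[List[float]]


def frame_signature(frame: Frame, grid: int = 4) -> List[float]:
    """Hypothetical signature: mean intensity of each cell of a grid x grid layout."""
    rows, cols = len(frame), len(frame[0])
    signature: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * rows // grid, (gy + 1) * rows // grid
            x0, x1 = gx * cols // grid, (gx + 1) * cols // grid
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            signature.append(sum(cell) / len(cell) if cell else 0.0)
    return signature


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two signatures of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def filter_keyframes(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of keyframes whose signature differs from the last kept
    keyframe by more than `threshold` (an assumed filtering rule)."""
    kept: List[int] = []
    last: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        sig = frame_signature(frame)
        if last is None or signature_distance(sig, last) > threshold:
            kept.append(i)
            last = sig
    return kept


if __name__ == "__main__":
    dark = [[0.0] * 8 for _ in range(8)]
    bright = [[255.0] * 8 for _ in range(8)]
    print(filter_keyframes([dark, dark, bright, bright]))  # [0, 2]
```

Near-duplicate keyframes collapse onto the first of their run, which is one simple way the filtering step could reduce the number of keyframes.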

Controller 130 then executes color information application 230
to derive color information from the filtered keyframes.
Controller 130 then executes superhistogram application 240 to
derive superhistograms from the color information. Superhistogram
application 240 operates on the principles discussed in the article
by N. Dimitrova et al. entitled "Color Super Histograms for Video
Representation," Proceedings of the IEEE International Conference
on Image Processing, Volume 3, pp. 314-318, Japan, October 1999.
This article is hereby incorporated herein by reference for all
purposes. Superhistogram application 240 also operates on
principles discussed in co-pending United States Patent Application
No. 09/116,769 filed July 16, 1998 by Martino et al. entitled "A
Histogram Method for Characterizing Video Content." The disclosure
in United States Patent Application No. 09/116,769 is hereby
incorporated herein by reference for all purposes.

Superhistogram application 240 computes superhistograms by
computing color histograms for individual shots and then merging
the histograms into a single cumulative histogram, called a family
histogram, based on a comparison measure. A family histogram
initially represents the color union of two shots. As new frames
are added, the family histogram accumulates the new colors from the
respective shots. If the histogram of a new frame differs from the
family histograms previously constructed, then a new family
histogram is formed. An entire television program, for example,
may be represented by a few family histograms. The set of family
histograms is ordered with respect to the length of the temporal
segment of video that each family represents. The ordered set of
family histograms is called a superhistogram.
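The family-merging procedure described above can be sketched as follows, under assumptions the text leaves open: colors are quantized into 4 levels per RGB channel (64 bins), each shot is represented by one normalized histogram plus its length in frames, the comparison measure is half the L1 distance (so it lies between 0 and 1 and a threshold can be read as a fraction), a shot joins the closest existing family when that distance is within the threshold, and accumulated colors are weighted by shot length. `color_histogram`, `FamilyHistogram`, and `build_superhistogram` are hypothetical names; this is not presented as the implementation of superhistogram application 240.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def color_histogram(pixels: Sequence[Pixel], levels: int = 4) -> List[float]:
    """Normalized RGB histogram with levels**3 bins (quantization is an assumption)."""
    step = 256 // levels
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        qr, qg, qb = (min(c // step, levels - 1) for c in (r, g, b))
        hist[(qr * levels + qg) * levels + qb] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def half_l1(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0 for identical, 1 for disjoint normalized histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


@dataclass
class FamilyHistogram:
    counts: List[float]                               # accumulated, length-weighted colors
    members: List[int] = field(default_factory=list)  # indices of member shots
    frames: int = 0                                   # temporal length represented

    def normalized(self) -> List[float]:
        total = sum(self.counts)
        return [c / total for c in self.counts] if total else list(self.counts)


def build_superhistogram(
    shots: Sequence[Tuple[List[float], int]], threshold: float
) -> List[FamilyHistogram]:
    """Merge (histogram, length_in_frames) shots into families and return the
    families ordered by the length of video they represent, longest first."""
    families: List[FamilyHistogram] = []
    for index, (hist, length) in enumerate(shots):
        # Find the closest existing family under the comparison measure.
        best, best_d = None, float("inf")
        for family in families:
            d = half_l1(hist, family.normalized())
            if d < best_d:
                best, best_d = family, d
        if best is None or best_d > threshold:
            best = FamilyHistogram(counts=[0.0] * len(hist))  # start a new family
            families.append(best)
        # Accumulate the shot's colors into the family it joined.
        best.counts = [c + h * length for c, h in zip(best.counts, hist)]
        best.members.append(index)
        best.frames += length
    families.sort(key=lambda f: f.frames, reverse=True)
    return families
```

Lowering `threshold` yields more, shorter families, and raising it yields fewer, longer ones, which is the trade-off reported in Table I.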

As described in the article "Color Super Histograms for Video
Representation," histogram differences may be calculated by using
any one of the following methods: (1) the L1 distance measure,
(2) the L2 distance measure, (3) histogram intersection, (4) the
Chi Square test, or (5) bin-wise histogram intersection.
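The five comparison methods listed above may be written as distance functions over normalized histograms (bins summing to 1), where 0 means identical. The L1, L2, and histogram intersection forms are standard; the Chi Square form below is one common symmetric variant, and the bin-wise histogram intersection form (the mean per-bin ratio of the smaller value to the larger one, over bins that are non-empty in either histogram) is an assumption, since the text does not define it. The function names are illustrative.

```python
import math
from typing import Callable, Dict, Sequence

Hist = Sequence[float]  # normalized histogram (bins sum to 1)


def l1_distance(h1: Hist, h2: Hist) -> float:
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1: Hist, h2: Hist) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def intersection_distance(h1: Hist, h2: Hist) -> float:
    # Histogram intersection is a similarity in [0, 1]; convert it to a distance.
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def chi_square_distance(h1: Hist, h2: Hist) -> float:
    # Symmetric chi-square variant; bins that are empty in both are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def binwise_intersection_distance(h1: Hist, h2: Hist) -> float:
    # Assumed definition: 1 minus the mean min/max ratio over non-empty bins.
    ratios = [min(a, b) / max(a, b) for a, b in zip(h1, h2) if max(a, b) > 0]
    return 1.0 - sum(ratios) / len(ratios) if ratios else 0.0


DISTANCES: Dict[str, Callable[[Hist, Hist], float]] = {
    "L1": l1_distance,
    "L2": l2_distance,
    "Histogram Intersection": intersection_distance,
    "Chi Square": chi_square_distance,
    "Bin-Wise Histogram Intersection": binwise_intersection_distance,
}

if __name__ == "__main__":
    a, b = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
    for name, fn in DISTANCES.items():
        print(f"{name}: {fn(a, b):.3f}")
```

Because the measures have different ranges (L1 up to 2, intersection up to 1), the same percentage threshold may select quite different numbers of families per method, as the rows of Table I illustrate.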

Superhistogram application 240 calculates a distance measure for
clustering that is equal to the histogram difference between the
keyframes weighted by the distance between the video cuts.
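The text states only that the clustering distance is the histogram difference weighted by the distance between video cuts; it does not give the weighting function. The sketch below assumes a hypothetical linear weight, `1 + alpha * |cut_i - cut_j| / total_frames`, so that visually similar keyframes lying far apart in time are pushed apart, and uses half the L1 distance as the histogram difference. `alpha`, `weighted_distance`, and `distance_matrix` are illustrative assumptions.

```python
from typing import List, Sequence


def half_l1(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram difference in [0, 1] for normalized histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def weighted_distance(
    h1: Sequence[float], h2: Sequence[float],
    cut1: int, cut2: int, total_frames: int, alpha: float = 1.0,
) -> float:
    """Histogram difference scaled by a (hypothetical) linear temporal weight."""
    temporal = abs(cut1 - cut2) / total_frames
    return half_l1(h1, h2) * (1.0 + alpha * temporal)


def distance_matrix(
    hists: Sequence[Sequence[float]], cuts: Sequence[int], total_frames: int,
    alpha: float = 1.0,
) -> List[List[float]]:
    """Pairwise weighted distances between keyframes, usable as clustering input."""
    n = len(hists)
    return [
        [weighted_distance(hists[i], hists[j], cuts[i], cuts[j], total_frames, alpha)
         for j in range(n)]
        for i in range(n)
    ]
```

With this form, identical histograms stay at distance 0 regardless of their positions, while the temporal weight only amplifies genuine color differences.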

FIGURE 3 illustrates an exemplary superhistogram comprising
three family histograms. The superhistogram illustrated in FIGURE 3
was obtained using a Chi Square distance measure and a threshold of
fifty percent (50%). The three family histograms are denoted
"Family 0," "Family 1," and "Family 2." In this illustrative
example, Family 0 has forty-two (42) keyframes, Family 1 has
seventeen (17) keyframes, and Family 2 has one (1) keyframe. The
three family histograms (together with associated information) make
up the superhistogram.

Table I below contains an exemplary set of final results of
the superhistogram extraction method using automatically extracted
keyframes. The method is more fully described in the article
"Color Super Histograms for Video Representation." Table I shows
the results of five histogram differencing methods (i.e.,
comparison methods) using various thresholds. As the results show,
the total number of families derived for smaller thresholds ranges
from one hundred eighty (180) to five hundred (500). As the
threshold for similarity grows, however, a smaller number of
families is obtained, but with longer duration (i.e., a larger
number of frames).

TABLE I

Threshold                       | 10%              | 25%              | 50%              | 75%
Method                          | A    B      C    | A    B      C    | A    B      C    | A    B      C
Histogram Difference (L1)       | 185  3274   33   | 30   12890  112  | 8    27897  253  | 2    45577  426
Histogram Intersection          | 186  ?      ?    | ?    ?      ?    | ?    26529  237  | 2    45366  423
Histogram Difference (L2)       | 100  5023   41   | 15   22857  203  | 5    40676  382  | 1    58259  568
Chi Square Test                 | 568  669    1    | 91   51012  477  | 11   57746  558  | 1    58259  568
Bin-Wise Histogram Intersection | 568  669    1    | 568  669    1    | 178  6648   64   | 14   24671  219

(? = illegible entry)


Table I summarizes superhistogram families for various
thresholds and histogram difference methods for one selected
television program (i.e., one episode of the Seinfeld television
program). In Table I, the letter "A" designates the number of
families formed. The letter "B" designates the duration of the
longest family in frames. The letter "C" designates the number of
keyframes in the longest family.

As more fully described in the article "Color Super Histograms
for Video Representation," by modifying the threshold for the
histogram distance measure, the superhistogram method can produce a
desired number of families (i.e., clusters) of keyframes. The
number can be selectively varied in order to obtain a "compact"
visual summary.

For example, assume that it is desired to obtain five (5)
frames representing five (5) families from the superhistogram of
the episode of the Seinfeld television program. Then a threshold of
fifty percent (50%) and the L2 distance measure can be used. The
number five (5) is located in column A under the fifty percent
(50%) threshold for the L2 distance measure in Table I. As
another example, assume that it is desired to obtain two (2) frames
representing two (2) families from the superhistogram of the
episode of the Seinfeld television program. Then a threshold of
seventy-five percent (75%) and the L1 distance measure can be used.
The number two (2) is located in column A under the seventy-five
percent (75%) threshold for the L1 distance measure (or for
histogram intersection) in Table I.
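Choosing a method and threshold for a desired summary size amounts to a lookup in column A of Table I, as the two examples above show. The sketch below encodes the legible column A values of Table I for the Seinfeld episode (`None` marks the illegible entries) and returns the method/threshold pairs whose family count equals the target, falling back to the nearest available count. The function name `settings_for` and the nearest-count fallback are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLDS = (10, 25, 50, 75)  # percent

# Column A of Table I (number of families formed); None marks an illegible entry.
FAMILIES: Dict[str, Tuple[Optional[int], ...]] = {
    "L1": (185, 30, 8, 2),
    "Histogram Intersection": (186, None, None, 2),
    "L2": (100, 15, 5, 1),
    "Chi Square": (568, 91, 11, 1),
    "Bin-Wise Histogram Intersection": (568, 568, 178, 14),
}


def settings_for(target: int) -> List[Tuple[str, int]]:
    """Return (method, threshold) pairs whose family count is closest to target."""
    candidates = [
        (abs(count - target), method, threshold)
        for method, counts in FAMILIES.items()
        for threshold, count in zip(THRESHOLDS, counts)
        if count is not None
    ]
    best = min(gap for gap, _, _ in candidates)
    return [(method, threshold) for gap, method, threshold in candidates if gap == best]


if __name__ == "__main__":
    print(settings_for(5))  # [('L2', 50)]
    print(settings_for(2))  # [('L1', 75), ('Histogram Intersection', 75)]
```

These values apply only to the one program measured in Table I; another program would need its own table of family counts.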
Controller 130 executes keyframe selection application 250 to
select representative keyframe images for each family histogram in
the superhistogram. A representative keyframe image may be
(1) the first image in the family histogram, (2) the most
meaningful image in the superhistogram, (3) a randomly chosen
image, or (4) the image that is closest to the cluster (family)
center. The term "meaningful image" may refer to a frame with a
person's face, important text, etc. Visual summary application 260
then creates a compact visual summary using the selected keyframe
images.
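The selection options listed above can be sketched as one function over a single family. The sketch assumes that each member keyframe has a normalized color histogram, takes the family center to be the mean of those histograms, measures closeness with the L1 distance, and delegates the "most meaningful" choice to a caller-supplied score (for example, from a face or text detector, which is not shown). `select_representative` and its `strategy` values are hypothetical names.

```python
import random
from typing import Callable, Optional, Sequence


def select_representative(
    keyframe_ids: Sequence[int],
    histograms: Sequence[Sequence[float]],
    strategy: str = "center",
    score: Optional[Callable[[int], float]] = None,
    seed: Optional[int] = None,
) -> int:
    """Pick one representative keyframe id from a family's members."""
    if not keyframe_ids:
        raise ValueError("family has no keyframes")
    if strategy == "first":
        return keyframe_ids[0]
    if strategy == "meaningful":
        if score is None:
            raise ValueError("strategy 'meaningful' needs a score function")
        return max(keyframe_ids, key=score)
    if strategy == "random":
        return random.Random(seed).choice(list(keyframe_ids))
    if strategy == "center":
        # Family center = mean histogram; pick the member nearest to it (L1).
        n = len(histograms)
        center = [sum(column) / n for column in zip(*histograms)]
        dists = [sum(abs(h - c) for h, c in zip(hist, center)) for hist in histograms]
        return keyframe_ids[dists.index(min(dists))]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    ids = [10, 11, 12]
    hists = [[1.0, 0.0], [0.6, 0.4], [0.0, 1.0]]
    print(select_representative(ids, hists, "first"))   # 10
    print(select_representative(ids, hists, "center"))  # 11
```

The "center" strategy tends to pick the most typical frame of a family, while "first" is cheapest and "meaningful" depends entirely on the quality of the supplied score.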

After visual summary application 260 has completed its 
operations, controller 130 stores the resulting compact visual 
summary in a visual summary storage location 270 in memory unit 
120. Visual summary retrieval module 180 is capable of retrieving 
a compact visual summary that is stored in memory unit 120 and
causing the retrieved compact visual summary to be displayed in the 
manner previously described. 

In response to a user request, controller 130 is capable of 
accessing selected portions of video material summarized by the 
compact visual summary. The selected portions of video material 
are displayed by video processor 110. To access the video material,
controller 130 receives a user request that identifies and selects
a keyframe image. Controller 130 then retrieves a compact visual
summary from memory unit 120 that contains the selected keyframe
image. Controller 130 uses the compact visual summary to access 
(i.e., identify the location of) the corresponding portion of the 
video material. Controller 130 then sends the location information 
of the video material to video processor 110. Video processor 110 
then displays the selected portion of the video material. 

In response to a user request, controller 130 is also capable
of using a compact visual summary to assemble selected portions of
summarized video material to form new video material. To create
the new video material, controller 130 receives a user request that
identifies and selects keyframe images. Controller 130 then
retrieves a compact visual summary from memory unit 120 that
contains the selected keyframe images. Controller 130 uses the
compact visual summary to access (i.e., identify the location of)
the corresponding portions of the video material. Controller 130
then assembles the location information into a new arrangement as
specified by the user. The location information arranges the
selected portions of video material into new video material.
Controller 130 then sends the location information of the
individual selected portions of the new video material to video
processor 110. Video processor 110 then displays the new video
material.
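A minimal sketch of the two retrieval uses described in the preceding paragraphs, assuming that each entry of a compact visual summary records a keyframe id together with the first and last frame of the video segment it stands for (a hypothetical layout; the text says only that the summary is used to identify the location of the corresponding video). `SummaryEntry`, `locate`, and `assemble` are illustrative names, and the returned frame ranges stand in for the location information sent to video processor 110.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class SummaryEntry:
    keyframe_id: int
    start_frame: int  # first frame of the segment represented by the keyframe
    end_frame: int    # last frame of that segment (inclusive)


def index_summary(summary: Sequence[SummaryEntry]) -> Dict[int, SummaryEntry]:
    """Map keyframe ids to summary entries for constant-time lookup."""
    return {entry.keyframe_id: entry for entry in summary}


def locate(summary: Sequence[SummaryEntry], keyframe_id: int) -> Tuple[int, int]:
    """Return the frame range of the video portion behind one selected keyframe."""
    entry = index_summary(summary)[keyframe_id]
    return entry.start_frame, entry.end_frame


def assemble(
    summary: Sequence[SummaryEntry], selection: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return frame ranges in the user's chosen order, forming new video material."""
    index = index_summary(summary)
    return [(index[k].start_frame, index[k].end_frame) for k in selection]


if __name__ == "__main__":
    summary = [SummaryEntry(3, 0, 899), SummaryEntry(7, 900, 2399), SummaryEntry(9, 2400, 3000)]
    print(locate(summary, 7))            # (900, 2399)
    print(assemble(summary, [9, 3, 7]))  # [(2400, 3000), (0, 899), (900, 2399)]
```

Because only frame ranges are rearranged, the new video material can be described without copying any video data.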

FIGURE 4 illustrates a flow diagram showing an advantageous
embodiment of the method of the present invention. The steps of
the method are collectively referred to with the reference numeral
400. Controller 130 receives keyframes from video processor 110
(step 405). Controller 130 then extracts frame signatures from the
keyframes and filters the keyframes (step 410). Controller 130
then derives color information from the filtered keyframes
(step 415).

Controller 130 then derives superhistograms from the
color information (step 420). Controller 130 then selects
a representative keyframe or a representative set of multiple
keyframes for each family histogram (step 425). Controller 130
then creates a compact visual summary from the selected keyframe
images (step 430). Controller 130 then stores the compact visual
summary in a visual summary storage location 270 within memory unit
120 (step 435). When requested by a user, visual summary retrieval
module 180 retrieves a visual summary from memory unit 120 and
causes it to be displayed (step 440).

While the present invention has been described in detail with
respect to certain embodiments thereof, those skilled in the art
should understand that they can make various changes, substitutions,
modifications, alterations, and adaptations in the present
invention without departing from the concept and scope of the
invention in its broadest form.